🤖 Transformer Architecture
Self-Attention, BERT, GPT, Multi-Head Attention
Scoured 122,540 posts in 1.13s
How Transformer Architecture Powers LLMs
dev.to · 4h · Discuss: DEV · 🔄 Sequence-to-Sequence Models
Interpretable Vision Transformers in Image Classification via SVDA
arxiv.org · 12h · 🗄️ Vector Databases
Wavelet Meets Adam: Compressing Gradients for Memory-Efficient Training
chipublib.idm.oclc.org · 1d · 🧠 Neural Network Architectures
Looping Back to Move Forward: Recursive Transformers for Efficient and Flexible Large Multimodal Models
arxiv.org · 1d · 🔄 Sequence-to-Sequence Models
An uncertainty-aware transformer framework for wind power forecasting with multiscale attention and adaptive feature fusion
chipublib.idm.oclc.org · 1d · 📈 Time Series Forecasting
Cuentos: A Large-Scale Eye-Tracking Reading Corpus on Spanish Narrative Texts
nature.com · 11h · 🔄 Sequence-to-Sequence Models
How Andrej Karpathy Built a Working Transformer in 243 Lines of Code
analyticsvidhya.com · 4h · 🚀 Model Deployment
Beyond Kuramoto Models: Associative Memory and Plastic Synapses in ML Ensembles
hackernoon.com · 1d · 🧠 Neural Network Architectures
The 4 Mixture of Experts Architectures: How to Train 100B Models at 10B Cost
pub.towardsai.net · 4h · 🧠 Deep Learning
Carnegie Mellon at NeurIPS 2025
blog.ml.cmu.edu · 1d · 🧠 Deep Learning
Multi-TPC: A Multimodal Dataset for Three-Party Conversations with Speech, Motion, and Gaze
nature.com · 15h · 🔄 Sequence-to-Sequence Models
YORU: Animal behavior detection with object-based approach for real-time closed-loop feedback
science.org · 1d · 🧠 Deep Learning
A History of Large Language Models
gregorygundersen.com · 17h · 🔄 Sequence-to-Sequence Models
The 4 Flash Attention Variants: How to Train Transformers 10× Longer Without Running Out of Memory
pub.towardsai.net · 4d · 👁️ Attention Mechanisms
A C implementation of the inference pipeline for Mistral AI’s Voxtral Realtime 4B model
blog.adafruit.com · 1h · 🧠 Neural Network Architectures
Gibbs Measures from Deep Shaped Multilayer Perceptrons
link.aps.org · 4h · 🧠 Deep Learning
Training-Free Real-Time Control for Autoregressive Video Generation
daydream.live · 3h · Discuss: Hacker News · 🎲 Synthetic Data Generation
Digitizing the "Shokunin": How we encoded a Master's hammer strike into AI
yusukekaizen.substack.com · 11h · Discuss: Substack · 🤖 AI
Transformer-Based Memory Forecasting: Leveraging Anonymized Aggregates for Personal Insights
novice.media · 20h · Discuss: Hacker News · 🔄 LSTM Networks
An assistive robot learns to set and clear the table by observing humans
techxplore.com · 19h · 🤖 AI